Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Honor DAP-Auth-Token or Authorization: Bearer #1306

Merged
merged 1 commit into from
Apr 27, 2023

Conversation

tgeoghegan
Copy link
Contributor

When acting as an HTTP server (i.e., helpers handling aggregation job or aggregate share messages, or leaders handling collection requests), Janus will now check for the authentication token in the more standard Authorization: Bearer <token> format defined in RFC 6750, as it already does in the aggregator API. If that header is absent, it falls back to checking DAP-Auth-Token for compatibility with existing peers. Nothing changes about the format or handling of the authentication token itself, which remains an opaque blob o' bytes.

Part of #472

@tgeoghegan tgeoghegan requested a review from a team as a code owner April 27, 2023 00:08
core/src/task.rs Outdated
Ok(format!(
"Bearer {}",
str::from_utf8(self.as_ref())
.context("failed to covert authentication token to UTF-8")?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should update aggregator_core/src/task.rs:<Task as TryFrom<SerializedTask>>::try_from() to check that authentication tokens are valid UTF-8. Currently it only checks that they are valid HTTP header values. (alternately, we could construct a Vec<u8> here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm now taking a look at the places where we generate and store auth tokens across the aggregator API and the Janus API and TBH it's a bit of a mess of (unintentional?) double-base64-encoding. I'm going to take a beat to try to untangle this and separate what an auth token is (opaque blob of bytes) vs. how it's represented for transport via HTTP headers or k8s secrets or YAML or environment variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of AuthenticationToken was that it stored the token bytes directly, rather than an additionally-encoded form of the token. The reason I have this understanding is that the code that reads authentication tokens out of the DB places the bytes it reads directly into an AuthenticationToken with no further encoding/decoding, and there is no good reason for the DB to store an encoded form of the token.

For that reason, I think we want to base64-encode here.

Copy link
Contributor Author

@tgeoghegan tgeoghegan Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I agree, and symmetrically, extract_bearer_token must base64 decode, since this particular layer of base64 encoding is a detail of how we transport the token in the context of an authorization: Bearer header.

Where this gets confusing is that the impl Distribution<AuthenticationToken> for Standard constructs AuthenticationTokens by generating 16 random bytes, then encodes that in Base64URL-no-padding, then the bytes of that string are the contents of the AuthenticationToken. So this worked because the bytes of the authentication token were themselves a valid UTF-8 and Base64 string. I'm not sure if it makes sense to generate AuthenticationToken that way (at a minimum it's a less efficient way to store 16 bytes of entropy) but that's orthogonal to the goals of this change, so I won't mess with it.

Bonus: that makes this method infallible.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah -- I think the reason we made that choice was because the semantics of DAP-Auth-Token are "there is a correct token that must match for authz to pass, we pass it back and forth over an HTTP header" (which implies it must fit into an HTTP header, i.e. it is ASCII). So we chose to generate a random token by base64-encoding some random bytes, which is a handy way to create a token with a given amount of entropy that fits the ASCII requirement -- but in this case, the actual token is the base64-encoded string, not the underlying bytes that were randomly generated. And any process that could encode an ASCII string from an arbitrary byte buffer would work equally well, e.g. we could have hex-encoded, base32-encoded, etc.

Now that we're using Authorization: Bearer, the semantics are defined somewhat differently: we must send data that is at least shaped like base64-encoded data (RFC). Since our authorization tokens are actually arbitrary byte strings, we need to base64-encode here.

Once we have migrated away from supporting DAP-Auth-Token at all, or refactored things such that we can generate auth tokens per-authorization-method, we can change our encoding. It's a bit unfortunate we defined DAP-Auth-Token so loosely--maybe we should change our encoding scheme to keep things from getting confusing in practice? Or maybe we should support receiving auth tokens on a per-authorization-scheme basis, to avoid locking ourselves into the pattern that every authorization scheme must work with the same set of arbitrary bytes (or, alternatively, that our generated tokens must fit the requirements of every authorization scheme)?

Copy link
Member

@branlwyd branlwyd Apr 27, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I'm going to send a PR to change randomly-generated authorization tokens to be shaped like hex-encoded data rather than base64-encoded data, as the current situation is going to get very confusing as we implement authorization schemes that require base64-encoding the bearer token.)

edit: #1308

if let Some(received_token) = authorization_value.as_ref().strip_prefix(b"Bearer ") {
return Ok(Some(AuthenticationToken::from(received_token.to_vec())));
} else {
return Err(anyhow!("authorization header value is not a bearer token"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code could be simplified if Authorization header values with the wrong prefix were treated the same as missing headers, and this returned None in both cases. Some of the call sites already treat both outcomes the same, while others have two different paths that ultimately result in 403 status codes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think I agree: it's important to distinguish between the case where the authorization header is absent (in which case we want janus_aggregator to fall back to DAP-Auth-Token) and the case where the header is present but malformed (in which case we don't want to risk concealing a bug in the requestor by falling back to DAP-Auth-Token). I do notice I'm missing a test to verify that presenting a malformed authorization header causes a failure

@tgeoghegan tgeoghegan force-pushed the timg/authn-accept-either-token branch from 1bb53ff to 07b68a9 Compare April 27, 2023 17:43
core/src/task.rs Outdated
Ok(format!(
"Bearer {}",
str::from_utf8(self.as_ref())
.context("failed to covert authentication token to UTF-8")?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding of AuthenticationToken was that it stored the token bytes directly, rather than an additionally-encoded form of the token. The reason I have this understanding is that the code that reads authentication tokens out of the DB places the bytes it reads directly into an AuthenticationToken with no further encoding/decoding, and there is no good reason for the DB to store an encoded form of the token.

For that reason, I think we want to base64-encode here.

}
};

let decoded = match STANDARD.decode(bearer_token) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be removed now that extract_bearer_token() does base64 decoding

test_case
}

async fn setup_aggregate_init_test_stage_1() -> AggregationJobInitTestCase {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q: what does stage_1 mean here? could this be given a better name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I renamed it to clarify its purpose.

When acting as an HTTP server (i.e., helpers handling aggregation job or
aggregate share messages, or leaders handling collection requests),
Janus will now check for the authentication token in the more standard
`Authorization: Bearer <token>` format defined in RFC 6750, as it
already does in the aggregator API. If that header is absent, it falls
back to checking `DAP-Auth-Token` for compatibility with existing peers.
Nothing changes about the format or handling of the authentication token
itself, which remains an opaque blob o' bytes.

Part of #472
@tgeoghegan tgeoghegan force-pushed the timg/authn-accept-either-token branch from e3d70bd to bc8cf98 Compare April 27, 2023 21:29
@tgeoghegan tgeoghegan merged commit d3594e8 into main Apr 27, 2023
@tgeoghegan tgeoghegan deleted the timg/authn-accept-either-token branch April 27, 2023 22:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants